Semiconductor yield management system and method

ABSTRACT

A system and method for yield management are disclosed wherein a data set containing one or more prediction variable values and one or more response variable values is input into the system. The system can process the input data set to remove prediction variables with missing values and data sets with missing values based on a tiered splitting method to maximize usage of all valid data points. The processed data can then be used to generate a model that may be a decision tree. The system can accept user input to modify the generated model. Once the model is complete, one or more statistical analysis tools can be used to analyze the data and generate a list of the key yield factors for the particular data set.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to a system and method for managing a semiconductor manufacturing process and, more particularly, to a system and method for managing yield in a semiconductor fabrication process.

2. Description of the Prior Art

The semiconductor manufacturing industry is continually evolving its fabrication processes and developing new processes to produce smaller and smaller geometries of the semiconductor devices being manufactured, because smaller devices typically generate less heat and operate at higher speeds than larger devices. Currently, a single integrated circuit chip may contain over one billion patterns. Consequently, semiconductor fabrication processes are extremely complex, and hundreds of processing steps may be involved. The occurrence of a mistake or small error at any of the process steps or tool specifications may cause lower yield in the final semiconductor product, where yield may be defined as the number of functional devices produced by the process as compared to the theoretical number of devices that could be produced assuming no bad devices.

Improving yield is a critical problem in the semiconductor manufacturing industry and has a direct economic impact on it. In particular, a higher yield translates into more devices that may be sold by the manufacturer, and, hence, greater profits.

Typically, semiconductor manufacturers collect data about various semiconductor fabrication process parameters and analyze the data and, based on data analysis, adjust process steps or tool specifications in an attempt to improve the yield of the process. Today, the explosive growth of database technology has facilitated the yield analyses that each manufacturer performs. In particular, the database technology has far outpaced the yield management analysis capability when using conventional statistical techniques to interpret and relate yield to major yield factors. This has created a need for a new generation of tools and techniques for automated and intelligent database analysis for semiconductor yield management.

Many conventional yield management systems have a number of limitations and disadvantages which make them undesirable to the semiconductor manufacturing industry. For example, conventional systems may require some manual processing which slows the analysis and makes the system susceptible to human error. In addition, these conventional systems may not handle both continuous (e.g., temperature) and categorical (e.g., Lot 1, Lot 2, etc.) yield management variables. Some conventional systems cannot handle missing data elements and do not permit rapid searching through hundreds of yield parameters to identify key yield factors. Some conventional systems output data that is difficult to understand or interpret even by knowledgeable semiconductor yield management personnel. In addition, conventional systems typically process each yield parameter separately, which is time consuming and cumbersome and cannot identify more than one parameter at a time.

U.S. Pat. No. 6,470,229 B1, assigned to the same assignee as the present application, discloses a yield management system and technique for processing a yield data set containing one or more prediction variable values and one or more response variable values to remove prediction variables with missing values and data sets with missing values. The processed data can then be used to generate a yield model preferably in the form of a decision tree. The system can also accept user input to modify the generated model.

While the yield management system and technique disclosed in aforementioned U.S. Pat. No. 6,470,229 B1 provide a powerful yield management tool, one limitation is that the criteria employed for processing data sets may remove data sets with missing values, even though the data sets may contain usable data respecting a significant prediction variable that may be useful in generating the model. Also, while the disclosed system and technique provide fundamental splitting rules for generating a decision-tree based model, there are instances in which the system is limited in the variety of splitting rules and also limited in accommodating modification of the model based on the knowledge of the user.

Thus, it would be desirable to provide a yield management system and method which overcome the above limitations and disadvantages of conventional systems and facilitate building a more accurate model. It is to this end that the present invention is directed. The various embodiments of the present invention provide many advantages over conventional methods and yield management systems.

SUMMARY OF THE INVENTION

One embodiment of the yield management system and method in accordance with the present invention provides many advantages over conventional yield management systems and techniques, which make the yield management system and method in accordance with the present invention more useful to semiconductor manufacturers. The system may be fully automated and is easy to use, so that no extra training is necessary to make use of the yield management system. In addition, the yield management system handles both continuous and categorical variables. The system also automatically handles missing data during a processing step that is optimized to consider data for all significant yield parameters. The system can rapidly search through hundreds of yield parameters and generate an output indicating the one or more key yield factors/parameters. The system generates an output preferably in the form of a decision tree that is easy to interpret and understand. The system may employ advanced splitting rules to parse the data and is also very flexible in that it permits prior yield parameter knowledge from one or more users to be easily incorporated into the building of the model. Unlike conventional yield management systems, if there is more than one yield factor/parameter affecting the yield of the process, the system can identify all of the parameters/factors simultaneously, so that the multiple factors/parameters are identified during a single pass through the yield data.

In accordance with various embodiments of the present invention, the yield management system and method may receive a yield data set. When an input data set is received, one embodiment of the yield management system and method in accordance with the present invention first performs a data processing step in which the validity of the data in the data set is checked, and cases or parameters with missing data are identified. One embodiment of the semiconductor yield management system and method in accordance with the present invention provides a tiered splitting method to maximize usage of all valid data points. Another embodiment of the yield management system and method in accordance with the present invention provides an outlier filtering method. Also, in accordance with various other embodiments of the yield management system and method of the present invention, a user can select from among 1) add tool usage parameters, 2) treat an integer as categorical, and 3) auto-categorize methods for better data manipulation capability and flexibility.

The semiconductor yield management system and method in accordance with one embodiment of the present invention also preferably provide a linear type split and a range type split for use in constructing the model when the response variable and the prediction variable have a linear relationship, in order to overcome the shortcoming of a binary decision tree that has to split on the prediction variable several times on different levels and does not necessarily show that the relationship is linear. The semiconductor yield management system and method in accordance with various embodiments of the present invention also provide user control in formulating the rules for splitting nodes, so that the user may assure that more appropriate and accurate models are generated. Preferably, the user selectable split methods include: 1) consider tool and date parameters jointly; 2) consider tool and event parameters jointly; 3) maximize class distinction; 4) prefer simple splits; 5) minimum purity; 6) parameter weighting; 7) minimum group size; 8) maximum number of descendants; and 9) raw data mapping.

Additionally, if the prediction variable is categorical, one embodiment of the yield management system and method in accordance with the present invention enables the user to select any combination of classes of the variable and include them in one sub-node of the decision tree. The remainder of the data is included in the other sub-node. On the other hand, if the prediction variable is continuous, there are preferably three types of split formats from which the user may select. The available split formats are 1) a default type (a≦X), 2) a range type (a1≦X<a2), and 3) a linear type (X<a1, X in [a1, a2], X in [a2, a3], X>a3). These different split formats help the user to produce an accurate model.
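By way of illustration only, the three split formats for a continuous prediction variable may be sketched in Python as follows. The function names are hypothetical, and the treatment of a value lying exactly on a boundary shared by two intervals of the linear type (assigned here to the higher interval) is an assumption of this sketch, since the formats as listed leave that case open.

```python
def default_split(x, a):
    """Default type: True when a <= X; False places the case in the other sub-node."""
    return a <= x


def range_split(x, a1, a2):
    """Range type: True when a1 <= X < a2; False places the case in the other sub-node."""
    return a1 <= x < a2


def linear_split(x, a1, a2, a3):
    """Linear type: index of the sub-node among X<a1, [a1, a2], [a2, a3], X>a3.

    Assumption: a value equal to a2 is assigned to the [a2, a3] sub-node.
    """
    if x < a1:
        return 0
    if x < a2:
        return 1
    if x <= a3:
        return 2
    return 3


# Hypothetical values of a prediction variable X, grouped by the linear type split.
values = [0.5, 1.0, 1.7, 2.0, 2.9, 3.0, 3.5]
groups = [linear_split(x, 1.0, 2.0, 3.0) for x in values]
```

A linear type split produces four ordered sub-nodes in one step, whereas a binary tree would need several successive default type splits on the same variable to reach a comparable partition.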

Using the cleaned-up data set, a yield mine model is built during a model building step. Once the model is generated automatically by the yield management system and method in accordance with the present invention, the model may be further modified by one or more users based on their experience or prior knowledge of the data set.

The yield management system and method in accordance with one embodiment of the present invention also preferably enable the user to select a method to generate multiple models simultaneously, so that the user may choose a group of parameters for the model building. The yield management system and method in accordance with the present invention then generate a model for each of the parameters selected by the user.

Another embodiment of the yield management system and method in accordance with the present invention additionally enables the user to invoke a method to redisplay the setup window and quickly modify his or her previous selections, so that the model may be adjusted. Finally, the yield management system and method in accordance with another embodiment of the present invention enable the user to invoke methods to collapse or expand a node: to collapse the node when the user decides that the split of the node is unnecessary or, alternatively, to expand the node when the user wants to examine the aggregate statistics of the entire subset. The method to expand a node may also be invoked by the user to expand a previously collapsed node, so that the node returns to its original length.

After the model has been modified, the data set may be processed using various statistical analysis tools to help the user better understand the relationship between the prediction and response variables. The yield management system and method in accordance with the present invention provide a yield management tool that is much more powerful and flexible than conventional tools.

The foregoing and other objects, features, and advantages of the present invention will become more readily apparent from the following detailed description of various embodiments, which proceeds with reference to the accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING

The various embodiments of the present invention will be described in conjunction with the accompanying figures of the drawing to facilitate an understanding of the present invention. In the figures, like reference numerals refer to like elements. In the drawing:

FIG. 1 is a block diagram illustrating an example of a yield management system in accordance with one embodiment of the present invention implemented on a personal computer;

FIG. 2 is a block diagram illustrating more details of the yield management system in accordance with the embodiment of the present invention shown in FIG. 1;

FIG. 3 is a flowchart illustrating an example of a yield management method in accordance with one embodiment of the present invention;

FIG. 4 is a diagram illustrating a known data processing procedure;

FIG. 5 is a diagram illustrating a tiered splitting data processing procedure in accordance with one embodiment of the method of the present invention;

FIG. 6 illustrates an initial display screen displayed by the yield management system shown in FIG. 1;

FIG. 7 illustrates a drop-down menu that appears when a user positions a mouse pointer on “Analysis” which appears in the menu bar of the display screen shown in FIG. 6 and clicks the left mouse button and then positions the mouse pointer on “Yield Mine” in the drop-down menu;

FIG. 8 illustrates a setup display screen which appears when the user positions the mouse pointer on “Setup” in the drop-down menu illustrated in FIG. 7 and clicks the left mouse button;

FIG. 9 illustrates a scroll-down list that enables a user to select a method to have the yield management system of the present invention filter outliers;

FIG. 10 is a diagram illustrating a data processing procedure to add tool usage parameters in accordance with one embodiment of the method of the present invention;

FIG. 11 illustrates an example of a yield parameter being selected by the user and a decision tree node being automatically split or manually split in accordance with one embodiment of the method of the present invention;

FIG. 12 is a flowchart illustrating a recursive node splitting method in accordance with one embodiment of the method of the present invention;

FIG. 13 illustrates an example of a yield parameter being selected by the user and a decision tree node being built based on a joint type split at a top level and based on a linear type split at a bottom level in accordance with various embodiments of the method of the present invention;

FIG. 14 illustrates an example of a yield parameter being selected by the user and a decision tree node being built based on a range type split in accordance with another embodiment of the method of the present invention;

FIG. 15 illustrates a window provided for the user to weight parameters in accordance with another embodiment of the method of the present invention;

FIG. 16 illustrates an example of a yield parameter being selected by the user and a decision tree node being built based on a binary split in accordance with one embodiment of the method of the present invention;

FIG. 17 illustrates a pop-up menu that appears when the user positions a mouse pointer on a yield parameter and clicks the left mouse button to invoke a new cut rule method in accordance with another embodiment of the method of the present invention;

FIG. 18 illustrates a window that appears when the user positions a mouse pointer on “New Cut-Point” shown in FIG. 17 and clicks the left mouse button to select a split format for a continuous prediction variable;

FIG. 19 illustrates a window that appears when the user positions a mouse pointer on “New Cut-Point” shown in FIG. 17 and clicks the left mouse button to select a combination of classes for a categorical prediction variable;

FIG. 20 illustrates a window that appears when the user positions a mouse pointer on “New Split Rule” shown in FIG. 17 and clicks the left mouse button to display the split rules for the top N scored parameters;

FIG. 21 illustrates a window to select a number, N, of the split rules for the top N scored parameters shown in FIG. 20;

FIG. 22 illustrates an example of various yield parameters being selected by the user and decision trees being built to generate multiple models simultaneously in accordance with one embodiment of the method of the present invention;

FIG. 23 illustrates a pop-up menu which enables the user to elect to modify setup selections in accordance with one embodiment of the method of the present invention;

FIG. 24 illustrates a pop-up menu which toggles to “Expand Sub-Nodes” when the “Collapse Sub-Nodes” method is invoked by the user in accordance with another embodiment of the present invention; and

FIG. 25 illustrates an example of statistical tools available to the user in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is particularly applicable to a computer-implemented software-based yield management system, and it is in this context that the various embodiments of the present invention will be described. It will be appreciated, however, that the yield management system and method in accordance with the present invention have greater utility, since they may be implemented in hardware or may incorporate other modules or functionality not described herein.

FIG. 1 is a block diagram illustrating an example of a yield management system 10 in accordance with one embodiment of the present invention implemented on a personal computer 12. In particular, the personal computer 12 may include a display unit 14, which may be a cathode ray tube (CRT), a liquid crystal display, or the like; a processing unit 16; and one or more input/output devices 18 that permit a user to interact with the software application being executed by the personal computer. In the illustrated example, the input/output devices 18 may include a keyboard 20 and a mouse 22, but may also include other peripheral devices, such as printers, scanners, and the like. The processing unit 16 may further include a central processing unit (CPU) 24, a persistent storage device 26, such as a hard disk, a tape drive, an optical disk system, a removable disk system, or the like, and a memory 28. The CPU 24 may control the persistent storage device 26 and memory 28. Typically, a software application may be permanently stored in the persistent storage device 26 and then may be loaded into the memory 28 when the software application is to be executed by the CPU 24. In the example shown, the memory 28 may contain a yield manager 30. The yield manager 30 may be implemented as one or more software applications that are executed by the CPU 24.

In accordance with the present invention, the yield management system 10 may also be implemented using hardware and may be implemented on different types of computer systems, such as client/server systems, Web servers, mainframe computers, workstations, and the like. Now, more details of an exemplary implementation of the yield management system 10 in software will be described.

FIG. 2 is a block diagram illustrating more details of the yield manager 30 in accordance with one embodiment of the present invention. In particular, the yield manager 30 may receive a data set containing various types of semiconductor process data, including continuous/numerical data, such as temperature or pressure, and categorical data, such as the lot number of the particular semiconductor device or integrated circuit. The yield manager 30 may process the data set, generate a model, apply one or more statistical tools to the model and data set, and generate an output that may indicate, for example, the key factors/parameters that affected the yield of the devices that generated the current data set.

Considered in more detail, as shown in FIG. 2, the data set may be input to a data processor 32 that may optimize and validate the data and remove incomplete data records. The output from the data processor 32 may be fed into a model builder 34, so that a model of the data set may be automatically generated by the yield manager 30. Once the model builder 34 has generated a model, the user may preferably enter model modifications into the model builder to modify the model based on, for example, past experience with the particular data set. Once any user modifications have been incorporated into the model, a final model is output and is preferably made available to a statistical tool library 36. The library 36 may contain one or more different statistical tools that may be used to analyze the final model. The output of the yield manager 30 may be, for example, a listing of one or more factors/parameters that contributed to the yield of the devices that generated the data set being analyzed. As described above, the yield manager 30 is able to simultaneously identify multiple yield factors. Now, a yield management method in accordance with one embodiment of the present invention will be described.

By way of background, data preparation is always an important aspect of any yield management system. Sometimes, as much as 90% of the time is spent on cleaning up the data and making it suitable for analysis.

Data collected from semiconductor metrology tools often contain missing data and outliers, which cause problems for analysis. In order to deal with these problems, once the user obtains data from the metrology tool, a preferred embodiment of the semiconductor yield management system and method in accordance with the present invention may maximize the usage of all valid data points for key yield factors/parameters. The preferred embodiment also preferably provides a range of methods for filtering out outliers. These methods will now be described in detail.

FIG. 3 is a flowchart illustrating an example of a yield management method 40 in accordance with one embodiment of the present invention. The method may include receiving an input data set, as indicated by a step 41 shown in FIG. 3, and processing the input data set, as indicated by a step 42 shown in FIG. 3, to clean up the data set (e.g., optimize data usage and validate the data and remove data records containing missing, erroneous, insignificant, or invalid data elements).

As indicated by a step 44 shown in FIG. 3, the cleaned-up data set may be used to build one or more models; and the user may enter model modifications, as indicated by a step 46 shown in FIG. 3. Once the model is complete, it may be analyzed, as indicated by a step 48 shown in FIG. 3, using a variety of different statistical tools to generate yield management information, such as key yield factors. Each of the above steps will now be described in more detail to provide a better understanding of the method in accordance with the various embodiments of the present invention. In particular, the data processing step 42 in accordance with the method of the present invention will now be described.

The data processing step 42 shown in FIG. 3 helps to clean up the incoming data set so that the later analysis may be more fruitful. The yield management system 10 shown in FIG. 1 can handle data sets with complicated data structures. A yield data set typically has hundreds of different variables. These variables may include both a response variable, Y, and prediction variables, X₁, X₂, . . . , Xₘ, that may be of a numerical type or a categorical type. On the one hand, a variable is a numerical type variable if its values are real numbers, such as different temperatures at different times during the semiconductor fabrication process. On the other hand, a variable is a categorical type variable if its values are drawn from a finite set of elements not necessarily having any natural ordering. For example, a categorical variable may take values in a set such as {MachineA, MachineB, MachineC} or {Lot1, Lot2, Lot3}.

It is common for a yield data set to have missing values. U.S. Pat. No. 6,470,229 B1 discloses a data processing step that preferably removes the cases or variables having missing values. In particular, the processing may initially remove all prediction variables that are “bad”. A variable is understood to be “bad” if it has too much missing data, ≧MS, or, for a categorical variable, if it has too many distinct classes, ≧DC. Aforementioned U.S. Pat. No. 6,470,229 B1 discloses that both MS and DC may be user-defined thresholds, so that the user may set these values and control the processing of the data set. For example, the default values may be MS=0.05×N, DC=32, where N is the total number of cases in the data set.

U.S. Pat. No. 6,470,229 B1 discloses that once the “bad” prediction variables are removed, then, for the remaining data set, data processing may remove all cases with missing data. If one imagines that the original data set is a matrix with each column representing a single variable, then data processing first removes all “bad” columns (variables) and then removes “bad” rows (missing data) in the remaining data set with the “good” columns.
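By way of illustration only, the column-then-row removal described above may be sketched in Python as follows. The function names, the use of None to mark an unknown value, and the sample rows are assumptions made for this sketch and are not taken from U.S. Pat. No. 6,470,229 B1; the ≧MS and ≧DC comparisons follow the thresholds as stated above.

```python
MISSING = None  # assumed marker for an unknown value


def is_bad_column(values, ms, dc, categorical):
    """A column is "bad" if it has >= ms missing values or,
    for a categorical variable, >= dc distinct classes."""
    n_missing = sum(v is MISSING for v in values)
    if n_missing >= ms:
        return True
    if categorical:
        n_classes = len({v for v in values if v is not MISSING})
        return n_classes >= dc
    return False


def clean(rows, predictors, response, categorical=(), ms=None, dc=32):
    """Remove "bad" columns first, then every row that still has a missing value."""
    if ms is None:
        ms = 0.05 * len(rows)  # default MS = 0.05 x N
    good = [p for p in predictors
            if not is_bad_column([r[p] for r in rows], ms, dc, p in categorical)]
    cleaned = [{k: r[k] for k in good + [response]}
               for r in rows
               if all(r[p] is not MISSING for p in good)]
    return good, cleaned


# Hypothetical data set: PRED3 has three unknown values, PRED2 has one.
rows = [
    {"PRED1": 0.2, "PRED2": 1.0, "PRED3": None, "RESPONSE": 10},
    {"PRED1": 0.5, "PRED2": None, "PRED3": 3.0, "RESPONSE": 12},
    {"PRED1": 0.7, "PRED2": 2.0, "PRED3": None, "RESPONSE": 9},
    {"PRED1": 0.9, "PRED2": 4.0, "PRED3": None, "RESPONSE": 14},
    {"PRED1": 1.1, "PRED2": 5.0, "PRED3": 6.0, "RESPONSE": 11},
]
good, cleaned = clean(rows, ["PRED1", "PRED2", "PRED3"], "RESPONSE", ms=2)
```

With MS set to 2 in this sketch, the PRED3 column is dropped first, and the single row with an unknown PRED2 value is dropped afterwards, leaving four complete rows.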

FIG. 4 is a diagram illustrating an example of the data processing technique disclosed in U.S. Pat. No. 6,470,229 B1. In particular, for this example, the MS variable is set to 2. FIG. 4 shows an original data set 50, a data set 52 once “bad” columns have been removed, and a data set 54 once “bad” rows have been removed. As shown, the original data set 50 may include three prediction variables (PRED1, PRED2, and PRED3) and a numerical response variable (RESPONSE) in which three values for PRED3 are unknown and one value for PRED2 is unknown. Since the MS value is set to 2 in this example, any prediction variable that has more than two unknown values is removed. Thus, as shown in the processed data set 52, the column containing the PRED3 variable is removed from the data set. Since the PRED2 variable has only one missing value, it is not removed from the data set in this step. Next, any “bad” rows of data are removed from the data set. In the example shown in FIG. 4, the row with a PRED1 value of 0.5 is removed, because the row contains an unknown value for variable PRED2. Thus, once the processing has been completed, the data set 54 contains no missing values.

In practice, however, data sets for semiconductor fabrication processes employed in the semiconductor manufacturing industry typically contain missing data. Discarding all the cases with missing data is extremely inefficient. For example, assume a data set with 500 parameters in which 1.0% of the data points are missing. This is not uncommon in the semiconductor industry. If the 1.0% missing data are randomly distributed, the probability of obtaining a complete observation without a single missing measurement is about 0.65%. This means that more than 99% of the cases contain missing measurements.
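The figure quoted above can be checked with a short calculation, sketched here in Python under the assumption that each of the 500 measurements in a case is missing independently with probability 0.01. The function name is hypothetical.

```python
def complete_case_probability(n_params, p_missing):
    """Probability that a case has no missing value among n_params
    measurements, assuming each is missing independently."""
    return (1.0 - p_missing) ** n_params


p_complete = complete_case_probability(500, 0.01)
p_incomplete = 1.0 - p_complete
print(f"complete cases: {p_complete:.2%}, incomplete cases: {p_incomplete:.2%}")
```

Under that independence assumption the result is 0.99 raised to the 500th power, which is well below 1%, in line with the approximate value stated above.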

To solve this problem, the yield management system and method in accordance with a preferred embodiment of the present invention provide a tiered splitting method. The tiered splitting method takes advantage of the fact that a split rule of a decision tree typically only involves a few parameters (most likely just one parameter) at a time. The tiered splitting method in accordance with a preferred embodiment of the present invention operates as follows.

In accordance with the tiered splitting method of the present invention, at the top node, for each parameter combination (P₁, P₂, . . . , Pₘ) that is a candidate set for a split rule, only cases in this particular set having missing values for a selected parameter are removed by a first processing step. Typically, m≦2; therefore, most cases are preserved after the top node split. The same tiered splitting method may also be used in subsequent splits.
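By way of illustration only, the case selection performed for one candidate parameter set may be sketched in Python as follows. The function names, the use of None to mark a missing value, and the sample cases are assumptions of this sketch; the sample is not the data set of any figure.

```python
MISSING = None  # assumed marker for a missing value


def cases_for_candidate(rows, candidate_params):
    """Tiered selection: keep every case that has a value for each
    parameter in the candidate set, ignoring all other parameters."""
    return [r for r in rows
            if all(r[p] is not MISSING for p in candidate_params)]


def complete_cases(rows, all_params):
    """For comparison: keep only cases with no missing value at all."""
    return cases_for_candidate(rows, all_params)


# Hypothetical cases with missing values scattered over three parameters.
rows = [
    {"P1": 1, "P2": None, "P3": 1, "Response": 0.9},
    {"P1": 1, "P2": 2, "P3": None, "Response": 0.8},
    {"P1": None, "P2": 2, "P3": 0, "Response": 0.4},
    {"P1": 0, "P2": 1, "P3": 1, "Response": 0.7},
    {"P1": 0, "P2": None, "P3": None, "Response": 0.5},
]

usable_for_p1 = cases_for_candidate(rows, ["P1"])
usable_for_p1_p3 = cases_for_candidate(rows, ["P1", "P3"])
usable_everywhere = complete_cases(rows, ["P1", "P2", "P3"])
```

In this sample, four of the five cases remain available when P₁ alone is the candidate, two remain for the pair (P₁, P₃), and only one case would survive if every case with any missing value were discarded.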

FIG. 5 shows an example comparing how missing data are treated using the data processing disclosed in U.S. Pat. No. 6,470,229 B1, with the MS value set to 6, and using data processing employing tiered splitting in accordance with the method of the present invention. As shown in FIG. 5, the original data set 56 contains three prediction variables (P₁, P₂, and P₃), one response variable (Response), and 13 cases.

With the MS value set to 6, no “bad” columns appear in FIG. 5, because the parameter having the most missing values is P₂, which has only 5 missing values. However, cases 2, 4, 5, 7, 9, 10, 11, 12, and 13 are “bad” rows because they have unknown values and are consequently removed during processing by the technique disclosed in U.S. Pat. No. 6,470,229 B1. Hence, cleaned-up data 58 using processing in accordance with U.S. Pat. No. 6,470,229 B1 preferably contains no unknown values.

The tiered splitting method in accordance with the present invention is based on designating a candidate parameter for a split rule at the time of processing and a value for that parameter during processing. For example, as shown in FIG. 5, the candidate parameter is P₁, and the value for P₁ used during processing is “1”. The cleaned-up data 62A contains all cases having a value of “1” for P₁. In contrast, cases missing a value for P₁ are removed, and all cases having a value for P₁ other than “1” are also removed. The removed data 62B are also shown in FIG. 5.

The advantage of tiered splitting can be shown by then applying a split rule P₃=1 to the cleaned-up data. On the one hand, applying this split rule (P₃=1) to the data set produced by the processing technique disclosed in U.S. Pat. No. 6,470,229 B1 results in a model 60 shown in FIG. 5. On the other hand, applying the split rule (P₃=1) to the cleaned-up data produced by the tiered splitting method in accordance with the present invention results in a model 64, which contains an additional set of data. Thus, in comparison, the technique disclosed in U.S. Pat. No. 6,470,229 B1 loses more information in building the model and may be less accurate.

Additionally, outliers are common in semiconductor fabrication process data sets. Outliers are data that do not lie within a normal statistical distribution. They are caused by a variety of factors, some as simple as typing errors. Because of the extreme values of outliers, a model generated from the data set may be distorted and misleading. In many cases, the user is aware of the existence of outliers and would like to remove them from consideration. The preferred embodiment of the semiconductor yield management system and method in accordance with the present invention provides an easy to use method, preferably available as an option for selection by a user, to filter out the outliers automatically.

FIG. 6 illustrates an initial display screen displayed by the yield management system 10 shown in FIG. 1. FIG. 7 illustrates a drop-down menu that appears when a user positions the mouse pointer on “Analysis” in the menu bar that appears on the display screen shown in FIG. 6 and clicks the left mouse button. FIG. 8 illustrates a setup display screen which appears when the user positions the mouse pointer on “Setup” in the drop-down menu illustrated in FIG. 7 and clicks the left mouse button.

As shown in FIG. 8, in the setup screen for the yield management system 10 (FIG. 1), the user may invoke the outlier filtering method in accordance with the present invention by positioning the mouse pointer on an “Outlier Filtering” box 70 and clicking the left mouse button. Preferably, the following three outlier filtering options are available to the user and appear in a drop-down list, as shown in FIG. 9:

1) None: No outlier filtering is performed. This is preferably the default and is the option initially displayed by the setup screen shown in FIG. 8.

2) Mean±N*std: In this case, the user also has the option to select a threshold value N, which the user enters in a data entry box 72 shown in FIG. 8 by positioning the mouse pointer on the up/down arrows and clicking the left mouse button, or by entering a value in the box using the numerical keys on the keyboard 20. The variable mean and standard deviation are preferably calculated according to the following formulae:

$$\mathrm{Mean} = \sum_{i=1}^{n} x_i / n \qquad \text{(Equation 1)}$$

$$\mathrm{std} = \sqrt{\sum_{i=1}^{n} \left( x_i - \mathrm{Mean} \right)^2 / \left( n - 1 \right)} \qquad \text{(Equation 2)}$$

The yield management system 10 removes cases outside the range of Mean±N*std.

3) Median±N*MAD: This is similar to the previous option, except that the range is centered on the median and the standard deviation is replaced by MAD, which is preferably calculated according to the following formula:

$$\mathrm{MAD} = \sum_{i=1}^{n} \left| x_i - \mathrm{Mean} \right| / n \qquad \text{(Equation 3)}$$
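By way of illustration only, the three outlier filtering options may be sketched in Python as follows. The function names and the option strings are hypothetical. The sketch assumes that std is the usual sample standard deviation (the square root of the sum of squared deviations divided by n − 1) and that MAD is the mean absolute deviation about the mean, as in Equation 3, while the filtering range for the third option is centered on the median.

```python
import math
import statistics


def sample_std(xs):
    """Sample standard deviation with an (n - 1) divisor (assumed form of Equation 2)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def mean_abs_dev(xs):
    """Mean absolute deviation about the mean (Equation 3)."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


def filter_outliers(xs, method="none", n=3.0):
    """Return the values kept by the selected option.

    method: "none", "mean_std" (Mean +/- N*std) or "median_mad" (Median +/- N*MAD).
    """
    if method == "none":
        return list(xs)
    if method == "mean_std":
        center, spread = sum(xs) / len(xs), sample_std(xs)
    elif method == "median_mad":
        center, spread = statistics.median(xs), mean_abs_dev(xs)
    else:
        raise ValueError(f"unknown method: {method}")
    low, high = center - n * spread, center + n * spread
    return [x for x in xs if low <= x <= high]


# Hypothetical measurements with one mistyped value (50.0).
values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 50.0]
kept_std = filter_outliers(values, "mean_std", n=2.0)
kept_mad = filter_outliers(values, "median_mad", n=2.0)
```

Because a single extreme value inflates both the mean and the standard deviation, the threshold N needs to be chosen with the sample size in mind; centering on the median makes the third option less sensitive to this effect.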

Also, in accordance with a preferred embodiment of the yield management method of the present invention, the user may select from among methods to add tool usage parameters, to treat an integer as a categorical variable, and to auto-categorize data, for better data manipulation capability and flexibility in connection with processing data sets at step 42 shown in FIG. 3. These methods will now be described in detail.

A first method, preferably available as an option for selection by the user, is to add tool usage parameters. The semiconductor device or integrated circuit manufacturing process may be extremely complex. It is quite common that a wafer has to pass more than 100 process steps. Among these steps, the same tool, for example, an etcher, may be used multiple times at different process steps for the same lot. This multiple usage magnifies the impact of the tool on the final yield. Based on this consideration, it may be desirable to construct parameters based on the number of times that a tool is used.

As shown in FIG. 8, one embodiment of the semiconductor yield management system and method in accordance with the present invention may enable the user to select construction of tool usage parameters in its setup. In particular, when the user places the mouse pointer on an “Add Tool Usage Parameters” button 74 shown in FIG. 8 and clicks the left mouse button, the yield management system 10 automatically processes the data set to identify the number of times that each tool is used during the semiconductor fabrication process. The tool usage parameter is a number that equals the number of times that a particular tool is used in each case contained in the data set for the semiconductor fabrication process under analysis. For example, a data set 76 shown in FIG. 10 contains three parameters P₁, P₂, and P₃, one response variable (Response), and four cases. The values for the prediction variables are either “Etch1” or “Etch2” relating to the use of two etchers. When the user clicks on the “Add Tool Usage Parameters” button 74 shown in FIG. 8, the yield management system 10 determines the number of times each tool (“Etch1” and/or “Etch2”) was used for each case, and tabulates the tool usage data during processing to produce a new data set 78 shown in FIG. 10. As shown, the data set 78 contains two new columns that specify the number of times each of the tools “Etch1” and “Etch2” was used in connection with each case. In particular, for the first case, tool “Etch1” generated the values of all three parameters P₁, P₂, and P₃ (i.e., the new “Etch1” parameter in the data set 78 is “3”), and tool “Etch2” was not used (i.e., the new “Etch2” parameter in the data set 78 is “0”). The newly constructed parameters may aid in identifying tool problems.
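The construction of tool usage parameters may be sketched as follows. The function name, the `_usage` column suffix, and all values other than the first case (which follows the description of FIG. 10) are assumptions made for illustration only.

```python
from collections import Counter

def add_tool_usage(cases, param_names):
    """Return copies of the cases with one extra count column per tool."""
    # Every tool name that appears in any of the listed parameters.
    tools = sorted({case[p] for case in cases for p in param_names})
    augmented = []
    for case in cases:
        counts = Counter(case[p] for p in param_names)
        row = dict(case)
        for tool in tools:
            # Counter returns 0 for a tool that the case never used.
            row[f"{tool}_usage"] = counts[tool]
        augmented.append(row)
    return augmented

# First case as described for FIG. 10; the second case and the response
# values are hypothetical.
data_set = [
    {"P1": "Etch1", "P2": "Etch1", "P3": "Etch1", "Response": 0.9},
    {"P1": "Etch1", "P2": "Etch2", "P3": "Etch2", "Response": 0.7},
]
with_usage = add_tool_usage(data_set, ["P1", "P2", "P3"])
```

For the first case the new columns hold 3 for “Etch1” and 0 for “Etch2”, matching the values described for the data set 78.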

As shown in FIG. 8, a second method, preferably available as an option for selection by the user, is to treat an integer as a categorical variable. Parameters with integer values are quite common in a semiconductor fabrication process data set. Occasionally, an integer is simply a class name and does not imply a relative scale between its value and other integer values. In this case, the variable may be appropriately treated as a categorical variable, instead of a continuous variable.

One embodiment of the semiconductor yield management system and method in accordance with the present invention provides an option in its setup selectable by the user to treat an integer as a categorical variable. In particular, as shown in FIG. 8, the user first highlights the selected variable in a scroll-down list 80 by positioning the mouse pointer on the listed variable and clicking the left mouse button. The user then positions the mouse pointer on a “Treat Integer as Categorical” check box 82 and clicks the left mouse button to designate the highlighted variable as a categorical variable. When the treat integer as categorical method is invoked by the user, the semiconductor yield management system and method in accordance with the present invention handle the variable as a categorical variable, rather than a continuous variable.

A third method, preferably available as an option for selection by the user, is auto-categorization. The distribution of a variable in a semiconductor fabrication process data set is typically not uniform or Gaussian. Occasionally, the distribution exhibits multiple local maxima. In this case, the user may want to bin the data into classes. This type of data manipulation is preferably made automatic in the semiconductor yield management system and method in accordance with the present invention.

In accordance with the semiconductor yield management system and method of the present invention, the user positions the mouse pointer on an “Auto-Categorize” button 84 shown in FIG. 8 and clicks the left mouse button to invoke the auto-categorization method. Within this option, the user may decide the number of categories in a data entry box 86 by positioning the mouse pointer on the up/down arrows and clicking the left mouse button, or by entering a value in the box using the numerical keys on the keyboard 20. The user also chooses whether small clusters are to be excluded by positioning the mouse pointer on a check box 88 and clicking the left mouse button. The user may also select appropriate treatment for outliers using the outlier filtering method described earlier. A preview check box 90, which the user may select by positioning the mouse pointer on the box and clicking the left mouse button, is preferably provided to enable the user to view the results and make appropriate adjustments. Data clustering is preferably performed using the nearest neighbor method well-known to persons skilled in the art. In one implementation, the user can select up to 12 bins. The result of auto-categorization is the creation of a new categorical variable. This variable serves as the new response variable for the yield management system and method in accordance with the present invention.
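One possible reading of the auto-categorization step is sketched below. It assumes that the “nearest neighbor method” refers to single-linkage clustering, which for a single variable amounts to cutting the sorted values at the widest gaps. The function name and sample values are hypothetical, and the exclusion of small clusters is not modeled.

```python
def auto_categorize(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 by cutting at the widest gaps."""
    if not 1 <= n_bins <= 12:  # the text mentions up to 12 bins in one implementation
        raise ValueError("n_bins must be between 1 and 12")
    distinct = sorted(set(values))
    # Indices i with a gap between distinct[i - 1] and distinct[i], widest first.
    widest = sorted(
        range(1, len(distinct)),
        key=lambda i: distinct[i] - distinct[i - 1],
        reverse=True,
    )[: n_bins - 1]
    # Place each cut halfway across its gap.
    cuts = sorted((distinct[i - 1] + distinct[i]) / 2 for i in widest)
    # A value's bin is the number of cuts lying below it.
    return [sum(v > cut for cut in cuts) for v in values]

# Hypothetical variable with three local maxima in its distribution.
readings = [1.0, 1.1, 0.9, 5.0, 5.2, 9.8, 10.0]
bins = auto_categorize(readings, 3)
```

The returned bin indices play the role of the new categorical variable described above.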

Once processing of the input data set is complete, the yield management system and method in accordance with the present invention build the yield model. Now, the model building step 44 shown in FIG. 3 will be described in more detail.

The yield management system 10 in accordance with the various embodiments of the present invention preferably uses a decision-tree-based method to build a yield model. In particular, the method partitions a data set, D, into sub-regions. The decision tree structure may be a hierarchical way to describe a partition of D. It is constructed by successively splitting nodes (as described below), starting with the root node (D), until some stopping criteria are met and the node is declared a terminal node. For each terminal node, a value or a class is assigned to all the cases within the node. Now, the node splitting method in accordance with various embodiments of the present invention and examples of decision trees will be described in more detail.

In general, FIG. 11 shows an example of a yield mine model decision tree 100 that may be generated by the yield management system 10 (FIG. 1). In this example, the data set contains 233 process step variables, 233 time variables corresponding to each process step, and 308 parametric test variables. However, only a portion of the variables is shown in FIG. 11 for clarity. All of these variables are used in the yield mine model building as prediction variables. The response variable in this example is named “GOOD_DIEIRING” and represents the number of good dies around the edge of a wafer produced during a particular semiconductor fabrication process run.

In this example, out of all 774 prediction variables, the yield mine model using decision tree prediction identifies one or more variables as key yield factors. In the example, the key yield factor variables are PWELLASH, FINISFI, TI_TIN_RTP_ (hidden by the overlying window), and VTPSP_. In this example, PWELLASH and FINISFI are time variables associated with the process variables PWELLASH_ and FINISFI_, and TI_TIN_RTP_ and VTPSP_ are process variables. Note that, for each terminal node 102 in the decision tree, the numerical value of the response variable at that terminal node is shown, so that the user can view the tree and easily determine which terminal node (and thus which prediction variables) results in the best value of the response variable.

In the decision tree structure model shown in FIG. 11, if a tree node is not terminal, it has a splitting criterion for the construction of its sub-nodes, as will be described in more detail below with reference to FIG. 12. For example, the root node is split into two sub-nodes depending on the criterion of whether PWELLASH is before or after 3:41:00 AM, 07/03/1998. If PWELLASH is before 3:41:00 AM, 07/03/1998, the case is put in the left sub-node. Otherwise, it is put in the right sub-node. The left sub-node is further split into its sub-nodes using the criterion FINISFI<07/17/1998 4:40:00 PM. The right sub-node is also further split into its sub-nodes using the criterion TI_TIN_RTP_=2RTP, where TI_TIN_RTP_ is a process step parameter and 2RTP is one of its specifications if the variable is continuous. For a terminal node, the average value of all cases under the node is shown. In this example, it is relatively clear to the user that when PWELLASH<07/03/1998 3:41:00 AM, the yield is higher, especially when the criterion FINISFI<07/17/1998 4:40:00 PM is also satisfied. The worst case occurs when PWELLASH≥07/03/1998 3:41:00 AM, TI_TIN_RTP_<2RTP, and VTPSP_ ∈ {23STEPS, 25STEPS, 26STEPS}.

Finding the proper stopping criteria for decision tree construction is a difficult problem. In order to deal with the problem, one may first over-grow the tree and then apply cross-validation techniques to prune the tree, as described in aforementioned U.S. Pat. No. 6,470,229 B1, the disclosure of which is hereby incorporated herein in its entirety by this reference. To grow an oversized tree, the method may keep splitting nodes in the tree until all cases in the node have the same response value, or the number of cases in the node is less than a user-defined threshold, n₀. The default is preferably n₀=max{5, floor(0.02×N)}, where N is the total number of cases in D, and the function floor(x) gives the largest integer that is less than or equal to x. Now, the construction of the decision tree and the method for splitting tree nodes in accordance with various embodiments of the present invention will be described.
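As a quick check of the default threshold, the expression n₀=max{5, floor(0.02×N)} can be evaluated as below. The function name is hypothetical, and integer division by 50 is used in place of floor(0.02×N) to avoid floating-point rounding; the two agree for non-negative integer N.

```python
def default_min_node_size(total_cases):
    """Default n0 = max{5, floor(0.02 * N)} for a data set of N cases."""
    # floor(0.02 * N) equals N // 50 for non-negative integers.
    return max(5, total_cases // 50)

# Hypothetical data set sizes.
small_set = default_min_node_size(100)
large_set = default_min_node_size(1000)
```

For data sets of fewer than 300 cases the floor of 5 applies, so the 2% rule only takes effect for larger data sets.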

FIG. 12 is a flowchart illustrating a method 110 for splitting nodes of a decision tree. As indicated by a step 112 shown in FIG. 12, a particular node of a decision tree, T, is selected. The process is then repeated for each node of the tree.

As indicated by a step 114 shown in FIG. 12, the method may determine if the number of data values in node T is less than a predetermined threshold, N. If the number of data values is less than N, then the splitting for the particular node is stopped, as indicated by a step 116 shown in FIG. 12, and the next node may be processed.

If the number of data values for the node is not less than N, then, as indicated by a step 118 shown in FIG. 12, the processing of the particular node is continued. In particular, for each prediction variable, i, where i=1, . . . , n, the “goodness” of the split value, Φ_i, is calculated. Then, as indicated by a step 120 shown in FIG. 12, the prediction variable, j, is selected such that Φ_j=MAX{Φ_i | i=1, . . . , n}. As indicated by a step 122 shown in FIG. 12, the method may determine if Φ_j>V, where V is a user-defined threshold value as described below. If Φ_j is not greater than the threshold value, then as indicated by a step 124 shown in FIG. 12, the splitting process for the particular node is stopped, and the processing continues with the next node.

If Φ_j>V, then as indicated by a step 126 shown in FIG. 12, the node, T, is split into one or more sub-nodes, T₁, T₂, . . . , T_m, based on the variable j. As indicated by a step 128 shown in FIG. 12, for each sub-node, T_k, where k=1, . . . , m, the same node splitting technique is applied. In this manner, each node is processed to determine if splitting is appropriate, and then each sub-node created during a split is also checked for susceptibility to splitting, as well. Thus, the nodes of the decision tree are split. Now, more details of the decision tree construction and node splitting method will be described.
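The flow of steps 112 through 128 can be sketched as a short recursive routine. This is a minimal illustration, not the patented implementation: the names `grow`, `score_by_category`, `min_cases` (the threshold N) and `min_goodness` (the threshold V) are hypothetical, the goodness measure is assumed to be a reduction in response variance, and only splits on categorical predictors are shown.

```python
def variance(values):
    """Population variance, used here as an assumed noisiness measure."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def score_by_category(cases, predictor):
    """Split the cases by predictor value; return (goodness, subsets)."""
    groups = {}
    for case in cases:
        groups.setdefault(case[predictor], []).append(case)
    subsets = list(groups.values())
    parent = variance([c["y"] for c in cases])
    children = sum(len(s) * variance([c["y"] for c in s]) for s in subsets)
    return parent - children / len(cases), subsets

def grow(cases, predictors, score_split, min_cases, min_goodness):
    leaf = {"cases": cases, "split_on": None, "children": []}
    if len(cases) < min_cases:                        # steps 114 and 116
        return leaf
    scored = [(p,) + tuple(score_split(cases, p)) for p in predictors]  # step 118
    best, phi, subsets = max(scored, key=lambda s: s[1])                # step 120
    if phi <= min_goodness or len(subsets) < 2:       # steps 122 and 124
        return leaf
    children = [grow(s, predictors, score_split, min_cases, min_goodness)
                for s in subsets]                     # steps 126 and 128
    return {"cases": cases, "split_on": best, "children": children}

# Hypothetical cases: yield "y" depends on the tool, not on the shift.
cases = [
    {"tool": "A", "shift": "day", "y": 0.9},
    {"tool": "A", "shift": "night", "y": 0.9},
    {"tool": "B", "shift": "day", "y": 0.5},
    {"tool": "B", "shift": "night", "y": 0.5},
]
tree = grow(cases, ["tool", "shift"], score_by_category, min_cases=2, min_goodness=0.0)
```

In this toy data set the root is split on the tool, and both sub-nodes become terminal because no further split improves the goodness measure.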

A decision tree is built to find relations between the response variable and the prediction variables. Each split, S, of a node, T, partitions the node into m sub-nodes T₁, T₂, . . . , T_m, in hopes that the sub-nodes are less “noisy” than T, as defined below. To quantify this method, a real-value function that measures the noisiness of a node T, g(T), may be defined wherein N^T denotes the number of cases in T, and N^{T_i} denotes the number of cases in the ith sub-node T_i. The partition of T is exclusive; therefore,

$\sum_{i=1}^{m} N^{T_{i}} = N^{T}.$

Next, one may define Φ(S) to be the goodness of split function for a split, S, wherein:

$\Phi(S) = g(T) - \frac{1}{N^{T}} \sum_{i=1}^{m} N^{T_{i}}\, g(T_{i}) \qquad \text{(Equation 4)}$

We say that the sub-nodes are less noisy than their ancestor if Φ(S)>0. A node split may depend only on one prediction variable. The method may search through all prediction variables, X₁, X₂, . . . , X_n, one by one to find the best split based on each prediction variable. Then, the best split is the one that maximizes Φ(S) and is preferably used to split the node. Generally, it is sufficient to explain the method by describing how to find the best split for a single prediction variable. Depending on the types of the response variable, Y, and the prediction variable, X, as being either categorical or numerical, there are four possible scenarios, as described in U.S. Pat. No. 6,470,229 B1. That patent describes in detail for each scenario how the split is constructed and how to assign a proper value or a class to a terminal node.
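The goodness of split function can be illustrated for a single numerical prediction variable. The sketch below assumes that g(T) is the variance of the response within the node, which is only one possible choice of noisiness measure; the function names and the data are hypothetical.

```python
def variance(ys):
    """Assumed noisiness measure g(T): population variance of the response."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def goodness_of_split(parent, subnodes, noise=variance):
    """Phi(S) = g(T) - (1 / N^T) * sum_i N^{T_i} * g(T_i)."""
    n_parent = len(parent)
    # The partition is exclusive, so the sub-node sizes add up to N^T.
    assert sum(len(s) for s in subnodes) == n_parent
    return noise(parent) - sum(len(s) * noise(s) for s in subnodes) / n_parent

def best_threshold(xs, ys):
    """Search binary splits X < a for the threshold a with the largest Phi(S)."""
    pairs = sorted(zip(xs, ys))
    all_ys = [y for _, y in pairs]
    best_phi, best_a = float("-inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal X values
        a = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < a]
        right = [y for x, y in pairs if x >= a]
        phi = goodness_of_split(all_ys, [left, right])
        if phi > best_phi:
            best_phi, best_a = phi, a
    return best_phi, best_a

# Hypothetical data with a step in the response between X=3 and X=4.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 20, 20, 20]
phi, threshold = best_threshold(xs, ys)
```

Here the split X<3.5 removes all of the variance in the response, so Φ(S) equals the variance of the parent node.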

As described above, the most common form of split in the decision tree is a binary split. The binary split partitions the data into two subsets. This type of split is easy to understand and can be easily illustrated in a decision tree diagram, as described earlier. The drawback is that a binary split may be too restrictive and may not be able to show certain common types of relationship between the response variable and the prediction variable.

For example, when the response variable and the prediction variable have a linear relationship, a binary decision tree will have to split on the prediction variable several times on different levels. Unfortunately, the binary split does not necessarily show that the relationship is linear. In order to deal with this type of problem, various embodiments of the semiconductor yield management system and method in accordance with the present invention provide a linear type split method and a range type split method for use in constructing the model. These types of splits will now be described in detail.

The linear split method in accordance with one embodiment of the present invention operates as follows. When both the response variable, Y, and the prediction variable, X, are continuous, a linear relationship between Y and X is common. A typical binary split, of the type X>a, simply divides the prediction variable into two subsets and only indicates that the two subsets {X>a} and {X≤a} are different. Such a binary split does not necessarily mean that the relationship is linear. To explicitly show a continuous linear relationship, one embodiment of the semiconductor yield management system and method in accordance with the present invention employs a linear split rule.

When the yield management system 10 shown in FIG. 1 identifies the relationship between X and Y as linear, linear splits are preferably used. Instead of partitioning the data into two subsets, the decision tree uses M (preferably having a default value of 4) sub-nodes to indicate the linear relationship. FIG. 13 shows an example of a linear split 92 at the bottom split level for a continuous variable DATE@PROCESS STEP=062, in which M equals the default value of 4. Since a continuous linear split appears different from the binary split shown in FIG. 11, the user can immediately identify the relationship as linear. At the same time, the linear split rule is the result of a fitted regression line. By generating M sub-nodes simultaneously, the yield management system and method in accordance with the present invention eliminate the need for splitting on X repeatedly, as would be the result in the case of a multiple level decision tree employing plural instantiations of a binary split rule.

The range split method in accordance with one embodiment of the present invention operates as follows. It is quite common for the optimal value of a parameter to produce the best yield results in the middle of its range. A deviation from the optimal value in a positive or negative way typically causes yield loss. This type of situation may be best modeled using a split rule of the form a1≤X<a2, referred to as a range type split.

FIG. 14 shows an example of applying a range split rule. In the situation in which the best results are obtained when a parameter is in the middle of its range, the range split rule generates a more accurate model than a traditional decision tree binary split rule of the form X<a. In the example shown in FIG. 14, the split rule for the continuous variable ETEST52 is 0.8789≤ETEST52<1.0292 and generates a range split 94. At the same time, by spanning the two extremes of the range of the variable, the range split rule enhances the significance of the variable and makes its impact easier to discern.
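A range type split of the form a1≤X<a2 may be found, in the simplest case, by an exhaustive search over candidate end points. The sketch below assumes that candidate end points are taken from the observed values of X and that a split is scored by the reduction in response variance; the function name and data are hypothetical and are not the ETEST52 values of FIG. 14.

```python
from itertools import combinations

def variance(ys):
    """Population variance of the response, used as an assumed noisiness measure."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_range_split(xs, ys):
    """Return (goodness, a1, a2) for the best rule a1 <= X < a2."""
    total = variance(ys)
    n = len(ys)
    best = (float("-inf"), None, None)
    # combinations() over sorted values yields every pair with a1 < a2, so the
    # inside subset always contains a1 and the outside subset always contains a2.
    for a1, a2 in combinations(sorted(set(xs)), 2):
        inside = [y for x, y in zip(xs, ys) if a1 <= x < a2]
        outside = [y for x, y in zip(xs, ys) if not (a1 <= x < a2)]
        weighted = (len(inside) * variance(inside) + len(outside) * variance(outside)) / n
        if total - weighted > best[0]:
            best = (total - weighted, a1, a2)
    return best

# Hypothetical parameter whose best yield occurs in the middle of its range.
xs = [0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10]
ys = [50, 52, 90, 92, 91, 51, 49]
goodness, a1, a2 = best_range_split(xs, ys)
```

A single binary split X<a cannot separate the high-yield middle from both low-yield extremes in this data, whereas the range rule isolates it in one split.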

Various embodiments of the semiconductor yield management system and method in accordance with the present invention preferably provide a plurality of additional methods to facilitate node splitting for construction of the decision tree. By way of background, semiconductor process data sets may vary substantially from one to another. A given parameter, which the user is attempting to use as a prediction variable to construct the decision tree, may exhibit different values among data sets. At the heart of the model building is the split rule which partitions a node into sub-nodes. By controlling the way split rules are formulated, the user may assure that more appropriate and accurate models are generated.

The semiconductor yield management system and method in accordance with the present invention preferably provide user control in formulating the rules for splitting nodes, including the following split rule methods: 1) consider tool and date parameters jointly; 2) consider tool and event parameters jointly; 3) maximize class distinction; 4) prefer simple splits; 5) minimum purity; 6) parameter weighting; 7) minimum group size; 8) maximum number of descendants; and 9) raw data mapping. These user selectable controls for formulating split rules are powerful tools in practice. They will now be described in detail. Now, the method for considering tool and date parameters jointly for splits in accordance with one embodiment of the present invention will be described.

Many data sets contain data respecting process tool designations as categorical values, as well as the times when the tools are used as continuous values. A common cause for yield problems may be associated with the use of a single tool. For example, the tool may be in proper operating condition at the beginning of a period during which data is collected. However, after a certain date during the period, a change in the tool operation causes the yield to drop. An accurate model to describe the above case involves splitting on both the tool and date parameters. However, for speed and practical considerations, most splits in conventional semiconductor yield management systems consider only one parameter at a time. To solve this problem, the semiconductor yield management system and method in accordance with one embodiment of the present invention provide a method, preferably available as an option for selection by the user, to consider tool and date parameters jointly for splits. This type of decision tree structure requires the semiconductor yield management system and method to look ahead one level when they are considering the split on the tool parameter.

To select the method for considering tool and date parameters jointly for splits, the user positions the mouse pointer on a “Consider tool and date jointly for splits” check box 200 shown in FIG. 8 and clicks the left mouse button. When the consider tool and date jointly for splits method is invoked, the semiconductor yield management system and method in accordance with one embodiment of the present invention not only consider each parameter, but also the tool parameter and its corresponding date parameter together as a split candidate. Because a joint split involves two parameters, the relative score for the joint split versus other splits is adjusted by a threshold.

FIG. 13 shows an example of a joint split 202, as indicated by the “YES-AND” connector between the top and intermediate split levels of the decision tree. When a joint split rule is being employed, the joint split rule may also be color-coded. For example, a green color for the split rule may indicate the split is a joint split. Now, the method for considering tool and event parameters jointly for splits in accordance with one embodiment of the present invention will be described.

In accordance with one embodiment of the semiconductor yield management system and method of the present invention, the user may select a scenario to produce a joint split on a tool parameter and one or more events related to use of that tool. For example, a tool may be tested for its particle counts using a test wafer periodically, such as on a daily basis. Because high particle counts can cause yield loss, periodically obtaining particle counts for the tool provides useful information. To identify this type of problem, a joint split on the tool and one or more related events, such as particle count measurements, is appropriate.

Accordingly, similar to the earlier described method for considering tool and date parameters jointly for splits, the semiconductor yield management system and method in accordance with one embodiment of the present invention provide a method, preferably available as an option for selection by the user, to consider tool and related event parameters jointly for splits. Thus, the model considers more than one parameter at a time by considering the tool and a related event measurement together.

To select the method for considering tool and related event parameters jointly for splits, the user positions the mouse pointer on a “Consider tool and event jointly for splits” check box 150 shown in FIG. 8 and clicks the left mouse button. When the consider tool and event jointly for splits method is invoked, the semiconductor yield management system and method in accordance with one embodiment of the present invention not only consider each parameter, but also the tool parameter and its related event parameter together as a split candidate. Because a joint split involves two parameters, the relative score for the joint split versus other splits is adjusted by a threshold. One distinction compared to the earlier described method for considering tool and date parameters jointly is that the tool may be associated with multiple events. In this case, the semiconductor yield management system and method in accordance with one embodiment of the present invention will consider pairing the tool parameter with each event measurement when building the model. Now, the method for maximizing class distinction for splits in accordance with one embodiment of the present invention will be described.

When a response variable is categorical, sometimes the user would like to build a model based on a particular class of the response variable, for example, the class corresponding to lots with bad yield. To accomplish the building of such a model, the semiconductor yield management system and method in accordance with one embodiment of the present invention provide a method, preferably available as an option for selection by the user, to maximize class distinction to produce splits.

To select the method for maximizing class distinction, the user positions the mouse pointer on a “Maximize Class Distinction” check box 160 shown in FIG. 8 and clicks the left mouse button. To select a class, the user additionally positions the mouse pointer on a “Class” box 162 and clicks the left mouse button. The class or classes that are available for selection by the user appear in a scroll-down list in the “Class” box 162, as shown in FIG. 8.

When the maximize class distinction method is invoked, the semiconductor yield management system and method in accordance with the present invention build a model based on splits that provide the greatest distinctions of the class selected by the user. For example, suppose a data set contains 100 “good” lots and 20 “bad” lots. A split, S, partitions the data set into two subsets. The first subset contains 90 “good” lots and 18 “bad” lots. The second subset contains 10 “good” lots and two “bad” lots. This type of split generally does not receive a high score from the semiconductor yield management system, because the distribution of “good” lots and “bad” lots is the same, namely, a 5:1 ratio for each subset. When the maximize class distinction method is invoked, and the user selects the “bad” lots as the class, the previous split receives a high score, because the system is now concentrating on splitting the “bad” lots, and the split produces a separation of 18 to two, which increases the ratio to 9:1. Now, the method for preferring simple splits in accordance with one embodiment of the present invention will be described.
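The arithmetic of this example can be checked with a few lines of Python. The snippet only reproduces the ratios quoted in the text; it does not attempt to reproduce the scoring used by the system, which is not specified here, and the variable names are hypothetical.

```python
from fractions import Fraction

# Lot counts from the example: (good, bad) for each subset of the split.
subsets = [(90, 18), (10, 2)]

# Good-to-bad ratio inside each subset; both match the 5:1 ratio of the parent.
within_ratios = [Fraction(good, bad) for good, bad in subsets]

# How the selected class ("bad") is divided between the two subsets.
bad_separation = Fraction(subsets[0][1], subsets[1][1])
```

Both within-subset ratios equal 5, while the separation of the “bad” lots between the subsets equals 9, which is the 9:1 figure cited above.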

When the prediction variable is a categorical variable with k classes, the number of possible splits is 2^(k−1)−1. For example, if a parameter has the following eight classes, {A, B, C, D, E, F, G, H}, the following are three of the 127 possible splits:

    1) {A} vs. {B, C, D, E, F, G, H}
    2) {A, D} vs. {B, C, E, F, G, H}
    3) {C, D, F, G} vs. {A, B, E, H}

If the top split is selected, it means using A matters in the outcome. If the bottom split is selected, it means {C, D, F, G} as a group is different from {A, B, E, H} as a group.

Now, let N₁ and N₂ denote the number of classes in each of the two subsets of each exemplary split shown above. Let N=min(N₁, N₂). In the above example, the N values are 1, 2, and 4, respectively, for the three splits shown. In practice, splits with smaller N values are simpler to conceptualize than those with greater N values. Therefore, a split with a small N value may be referred to as a simple split. If the user decides that a simple split is more likely to define an accurate model, and therefore wants to attribute more weight to that type of split, he or she may select the method to prefer simple splits.
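The count of 2^(k−1)−1 possible splits and the N values quoted above can be verified by enumeration. The function names below are hypothetical; the sketch simply lists every two-way partition of the class labels and computes N=min(N₁, N₂) for each.

```python
from itertools import combinations

def binary_splits(classes):
    """Enumerate every two-way partition of the class labels exactly once."""
    first, rest = classes[0], classes[1:]
    splits = []
    # Fixing the first class on the left side avoids counting mirror images twice.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(classes) - left
            if right:  # skip the case in which one side would be empty
                splits.append((left, right))
    return splits

def simplicity(split):
    """N = min(N1, N2), the number of classes on the smaller side of the split."""
    left, right = split
    return min(len(left), len(right))

classes = ["A", "B", "C", "D", "E", "F", "G", "H"]
splits = binary_splits(classes)
simple_splits = [s for s in splits if simplicity(s) == 1]
```

For the eight classes in the example, the enumeration yields 127 splits, of which eight have N=1, one for each class that can be singled out.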

When the user selects the method to prefer simple splits, the user is preferably provided with a range of selections from “Never” prefer simple splits to “Always” prefer simple splits, provided by a radio dial box 204 shown in FIG. 8. The user may position the mouse pointer at a selected location in the radio dial box 204 and click the left mouse button to enter his or her preference for simple splits. On the one hand, “Never” prefer simple splits means all splits are treated equally. On the other hand, “Always” prefer simple splits weights simple splits, so that essentially only splits with N values of 1 are considered. Now, the method for specifying minimum purity in accordance with one embodiment of the present invention will be described.

If a node is pure (i.e., all the cases in the node have the same response A_j), then f(T)=A_j. Otherwise, the node is not pure.

When the response variable is a categorical variable, each terminal node of the decision tree has its own response variable distribution. For example, if the response variable contains two classes, A and B, a terminal node consisting of 100 cases may have 70 cases belonging to class A, and the remaining 30 belonging to class B. Consequently, the distribution for this terminal node is {0.7, 0.3}. In some situations, the user may only be interested in a model which will show a high concentration of a certain class, for example, more than 90% of the cases must belong to class A. To achieve this result, the semiconductor yield management system and method in accordance with one embodiment of the present invention provide a method, preferably available as an option for selection by the user, to specify minimum purity.

When the user selects the minimum purity method, the user chooses a class of interest and sets a threshold for the chosen class. In order to set a threshold, the user positions the mouse pointer on a “Minimum Purity (%)” box 206 shown in FIG. 8 and clicks the left mouse button. The user enters a purity value as a percentage by positioning the mouse pointer on the up/down arrows adjacent the “Minimum Purity (%)” box 206 and clicking the left mouse button to select the threshold value, or by entering a value in the box using the numerical keys on the keyboard 20. The user also positions the mouse pointer on the “Purity Class” box 208 shown in FIG. 8 and clicks the left mouse button and highlights the chosen response variable in a scroll-down list that appears. For example, the response variable may contain two classes, “good yield” and “bad yield”. The user can select “bad yield” as the class of interest and a threshold value 80%. In this case, the semiconductor yield management system and method in accordance with the present invention consider a split valid if and only if at least one of the sub-nodes from the split has a distribution of more than 0.8 in the “bad yield” class. Now, the method for parameter weighting in accordance with one embodiment of the present invention will be described.
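The validity test for a split under the minimum purity method can be sketched as a single predicate. The function name and the sample sub-nodes are hypothetical; the rule itself follows the description above, in which a split is valid only if at least one sub-node holds more than the threshold fraction of the chosen class.

```python
def split_is_valid(subnodes, target_class, min_purity):
    """True if any non-empty sub-node has more than min_purity of target_class."""
    return any(
        len(node) > 0 and node.count(target_class) / len(node) > min_purity
        for node in subnodes
    )

# Hypothetical sub-nodes, each given as a list of class labels.
mostly_bad = ["bad yield"] * 9 + ["good yield"] * 1      # 90% "bad yield"
mostly_good = ["good yield"] * 20 + ["bad yield"] * 2
borderline = ["bad yield"] * 8 + ["good yield"] * 2      # exactly 80% "bad yield"

accepted = split_is_valid([mostly_bad, mostly_good], "bad yield", 0.8)
rejected = split_is_valid([borderline, mostly_good], "bad yield", 0.8)
```

The second split is rejected because its purest sub-node reaches exactly 80%, which is not more than the threshold.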

The knowledge of a user respecting what types of parameters are the likely cause of a yield problem may be helpful in building the correct model. In order to facilitate incorporating the knowledge of the user respecting the significance of various parameters, the preferred embodiment of the semiconductor yield management system and method in accordance with the present invention additionally provides a method, preferably available as an option for selection by the user, to weight one or more parameters.

The user selects the parameter weighting method by positioning the mouse pointer on a “Weighting File” button 212 shown in FIG. 8 and clicking the left mouse button to cause the overlying window shown in FIG. 15 to appear. The user may set a weight for each parameter in each text file that appears in the window shown in FIG. 15. The semiconductor yield management system and method in accordance with the present invention preferably set the default weight value to 1. In this way, the user only needs to adjust parameters with weights different from 1.

When the user invokes the parameter weighting method, the user may highlight a parameter by positioning the mouse pointer on the parameter appearing in the overlying window shown in FIG. 15 and clicking the left mouse button. The user next positions the mouse pointer on an “Open” button 214 and then clicks the left mouse button to open the parameter file. In the text file for the parameter, each line preferably has the following format:

    Weight X Pattern,

where:

Weight is a real value;

X is either R (a regular expression) or S/s (substring matching, with S for case insensitive and s for case sensitive); and

Pattern is the string which the parameter names are matched against.

An example of parameter weighting is as follows:

    2 R tool

The above expression means that all parameters containing the string “tool” have a weight of 2. When the semiconductor yield management system and method in accordance with the present invention determine which variable to split, they calculate an internal score for each parameter based on its statistical significance. Then, this score is multiplied by its weight to obtain its overall score. Preferably, the parameter with the highest overall score is determined to be the split parameter. Now, the method for specifying minimum group size in accordance with one embodiment of the present invention will be described.
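The weighting file format and the overall score calculation can be sketched as follows. The function names, parameter names, and internal scores are hypothetical. The sketch also assumes that the first matching line determines the weight of a parameter, which the text does not specify.

```python
import re

def parse_weight_line(line):
    """Parse one 'Weight X Pattern' line into (weight, mode, pattern)."""
    weight, mode, pattern = line.split(maxsplit=2)
    return float(weight), mode, pattern

def matches(mode, pattern, name):
    if mode == "R":   # regular expression
        return re.search(pattern, name) is not None
    if mode == "S":   # substring matching, case insensitive
        return pattern.lower() in name.lower()
    if mode == "s":   # substring matching, case sensitive
        return pattern in name
    raise ValueError(f"unknown matching mode: {mode!r}")

def parameter_weight(name, rules, default=1.0):
    """Weight of a parameter; the first matching rule wins (an assumption)."""
    for weight, mode, pattern in rules:
        if matches(mode, pattern, name):
            return weight
    return default  # the default weight is 1

rules = [parse_weight_line("2 R tool")]

# Hypothetical internal scores based on statistical significance.
internal_scores = {"etch_tool_03": 0.4, "ETEST52": 0.7}
overall_scores = {name: score * parameter_weight(name, rules)
                  for name, score in internal_scores.items()}
split_parameter = max(overall_scores, key=overall_scores.get)
```

In this sketch the weight of 2 lifts the tool parameter above ETEST52 even though its internal score is lower, so the tool parameter is chosen as the split parameter.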

Typically, a node is split when results of the partition produce two sub-nodes with significantly different response variable distributions. However, a split may have little practical value when the number of cases in the node is below a predetermined threshold. The semiconductor yield management system and method in accordance with a preferred embodiment of the present invention enable a user to set this threshold using a method, preferably available as an option for selection by the user, to specify minimum group size.

In order to invoke the minimum group size method, the user positions the mouse pointer on a “Minimum Group Size” threshold entry box 216 shown in FIG. 8. The user positions the mouse pointer on the up/down arrows and clicks the left mouse button, or enters a value in the box 216 using the numerical keys on the keyboard 20, to select the threshold value. When a node contains fewer cases than the selected threshold value, no further split is considered. This keeps the output clean and saves time in building the model.

Now, the method for specifying the maximum number of descendants in accordance with one embodiment of the present invention will be described.

In the majority of real cases, yield loss is typically caused by a single factor. The top split is generally the most important split. A user may not care about splits after a predetermined split level. In order to control the number of split levels in building the model, the preferred embodiment of the semiconductor yield management system and method in accordance with the present invention provides a method, preferably available as an option for selection by the user, to specify the maximum number of descendants.

The user invokes the method for specifying the maximum number of descendants by positioning the mouse pointer on a “Maximum # of Descendants” cut-off level entry box 218 shown in FIG. 8. The user positions the mouse pointer on the up/down arrows and clicks the left mouse button, or enters a value in the box 218 using the numerical keys on the keyboard 20, to select the predetermined cut-off level. During model building, when the decision tree reaches the predetermined cut-off level, no subsequent splits are generated.

Now, the method for enabling raw data mapping in accordance with one embodiment of the present invention will be described.
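Before turning to raw data mapping, the two stopping options described above, minimum group size and maximum number of descendants, can be illustrated together. The following Python sketch is a hedged illustration only. The names StoppingOptions and may_split are hypothetical, and it is an assumption that the cut-off level counts split levels starting from zero at the top node.

```python
from dataclasses import dataclass


@dataclass
class StoppingOptions:
    min_group_size: int    # value of the "Minimum Group Size" box
    max_descendants: int   # value of the "Maximum # of Descendants" box


def may_split(n_cases: int, level: int, opts: StoppingOptions) -> bool:
    """Return True if a node is still eligible to be split.

    n_cases: number of cases in the node.
    level:   split level of the node (assumption: the top node is level 0).
    """
    if n_cases < opts.min_group_size:
        return False   # fewer cases than the threshold: no further split
    if level >= opts.max_descendants:
        return False   # cut-off level reached: no subsequent splits
    return True


opts = StoppingOptions(min_group_size=30, max_descendants=2)
```

With these settings, a top node holding 50 cases may be split, whereas a node holding 20 cases, or any node at level 2, may not.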

Occasionally, data are binned before a model is built. However, a user may want to validate the model results with the raw data, instead of the binned data, in the follow-up analysis. One embodiment of the semiconductor yield management system and method in accordance with the present invention provides a method, preferably available as an option for selection by the user, to enable raw data mapping.

The user selects one or more variables for raw data mapping by highlighting the variables in a “Raw Data Mapping” scroll-down list 220 shown in FIG. 8 by positioning the mouse pointer on each selected variable and clicking the left mouse button. When the user invokes the raw data mapping method, the binned variable, which is treated as a categorical variable, is linked to its original form. This enables the user to plot the variable as a continuous variable, and examine its correlation with a continuous prediction variable using analysis tools, such as regression, or the like.
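One way to picture the link between a binned variable and its original form is a structure that stores both representations case by case. The following Python sketch is an assumption about how such a link might be kept, not the data structure of the described system. The names BinnedVariable, bin_with_mapping, and raw_for, the bin labels, and the sample values are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BinnedVariable:
    name: str
    labels: List[str]   # categorical value per case, used for model building
    raw: List[float]    # original continuous value per case, kept for follow-up

    def raw_for(self, label: str) -> List[float]:
        """Raw values of the cases that fall in one bin."""
        return [r for lab, r in zip(self.labels, self.raw) if lab == label]


def bin_with_mapping(
    name: str, values: Sequence[float], edges: Sequence[float]
) -> BinnedVariable:
    """Bin values using sorted edges while retaining the raw values."""
    labels = [f"bin{bisect_right(edges, v)}" for v in values]
    return BinnedVariable(name, labels, list(values))


thickness = bin_with_mapping(
    "oxide_thickness", [1.0, 5.0, 9.0, 6.5], edges=[3.0, 7.0]
)
```

The model would be built on thickness.labels, while thickness.raw remains available when the user wants to plot the variable as continuous or run a regression on it.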

The various embodiments of the yield management system and method in accordance with the present invention also provide several additional methods for selection by a user. The first method provides a split rule referred to as the new cut rule method, and the second method is used in model building and is referred to as the generate multiple models simultaneously method. These two methods will now be described in detail, beginning with the new cut rule method.

By way of background, once a parameter is identified as the split parameter, the split rule produced by conventional yield management systems is typically based on statistical significance. Underlying each yield problem, there is a real cause. Occasionally, the split rule produced by conventional yield management systems may be inaccurate due to noise present in the data. For example, FIG. 16 shows a binary split at the node “267031 N-LDD1_PH_TrackOut_Date<08/24/2001 06:35:00 PM”. Assume, however, that the user has knowledge that the tool was maintained on “05/25/2001”. Consequently, it is probable that the problem actually occurred on the maintenance date. The yield management system and method in accordance with the present invention preferably enable the user to adjust the model using the new cut rule method.

In order to invoke the new cut rule method, the user positions the mouse pointer on the displayed split rule, for example, “267031 N-LDD1_PH_TrackOut_Date<08/24/2001 06:35:00 PM” shown in FIG. 16, and clicks the left mouse button to pop up the menu shown in FIG. 17. As shown in FIG. 17, the pop-up menu includes a selection labeled “New Cut-Point”. The user positions the mouse pointer on “New Cut-Point” and clicks the left mouse button to display the window shown in FIG. 18.

When the user invokes the new cut rule method, the format of the split rule depends on whether the prediction variable is continuous or categorical. On the one hand, if the prediction variable is continuous, there are three types of split formats from which the user may select. The available split formats are 1) a default type (a≦X), as indicated by the numeral 161 shown in FIG. 18; 2) a range type (a1≦X<a2), as indicated by the numeral 162 shown in FIG. 18; and 3) a linear type (X<a1, X in [a1, a2], X in [a2, a3], X>a3), as indicated by the numeral 164 shown in FIG. 18. These different split formats help the user produce an accurate model. On the other hand, if the prediction variable is categorical, when the user positions the mouse pointer on the “New Cut-Point” selection and clicks the left mouse button, the window shown in FIG. 19 is displayed. The user may select any combination of classes of the variable and include them in one sub-node. The remainder of the data is included in the other sub-node.
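The split formats described above can be expressed as small functions that assign a case to a sub-node. The following Python sketch is illustrative only. The function names and the sub-node numbering are hypothetical, and the handling of the shared boundary a2 in the linear type (assigned here to the interval [a2, a3]) is an assumption, because the description lists both [a1, a2] and [a2, a3] as closed intervals.

```python
from typing import Hashable, Set


def default_split(x: float, a: float) -> int:
    """Default type (a <= X): sub-node 0 if the rule holds, else sub-node 1."""
    return 0 if a <= x else 1


def range_split(x: float, a1: float, a2: float) -> int:
    """Range type (a1 <= X < a2): sub-node 0 inside the range, else sub-node 1."""
    return 0 if a1 <= x < a2 else 1


def linear_split(x: float, a1: float, a2: float, a3: float) -> int:
    """Linear type: four sub-nodes ordered along X.

    Assumption: a value equal to a2 is placed in the [a2, a3] sub-node so that
    every case belongs to exactly one sub-node.
    """
    if x < a1:
        return 0
    if x < a2:
        return 1
    if x <= a3:
        return 2
    return 3


def categorical_split(value: Hashable, selected: Set[Hashable]) -> int:
    """Categorical variable: selected classes go to sub-node 0, the rest to 1."""
    return 0 if value in selected else 1
```

For a date variable such as the track-out date in FIG. 16, the user-supplied cut point would simply replace the value a in the default type once the dates are converted to a comparable numeric or datetime form.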

Referring again to FIG. 17, another selection in the pop-up menu is “New Split Rule”. The user positions the mouse pointer on “New Split Rule” and clicks the left mouse button to display the window shown in FIG. 20. When the user selects a new split rule, the split rules for the top N scored parameters are displayed in the new split rule setup screen, as shown in FIG. 20.

The user may select the number of alternate split rules to be displayed from the setup screen shown in FIG. 8 by selecting Edit→Edit preferences→Analysis→YieldMine to display the window shown in FIG. 21. If the user elects to have the split rules for a different number of the top scored parameters displayed, the user positions the mouse pointer on a “Display Top Alternate Split Rules” box 166 shown in FIG. 21 and clicks the left mouse button. The user then sets the number N of top scored parameters for which the split rules are to be displayed, either by positioning the mouse pointer on the up/down arrows adjacent the “Display Top Alternate Split Rules” box 166 and clicking the left mouse button, or by entering a number in the box using the numerical keys on the keyboard 20. Displaying the split rules for the top N scored parameters provides the user a quick view of the alternative splits without having to build new decision trees based on those parameters.

When a terminal node is reached following application of all of the split rules, a value or a class, f(T), is assigned to all cases in the node depending on the type of the response variable. If the type of the response variable is numerical, f(T) is a real-valued number. Otherwise, f(T) is set to be a class member of the set A={A₁, A₂, . . . , Aₖ}.
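The description does not state how f(T) is computed from the cases in a terminal node. As a hedged illustration only, the following Python sketch assumes the common decision tree convention of using the mean response for a numerical response variable and the most frequent class for a categorical one. The function name terminal_value is hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Sequence, Union


def terminal_value(responses: Sequence, numerical: bool) -> Union[float, str]:
    """Assign f(T) to a terminal node from the responses of its cases.

    Assumption: mean for a numerical response, majority class otherwise.
    """
    if numerical:
        return mean(responses)
    return Counter(responses).most_common(1)[0][0]
```

For example, a terminal node whose cases have yields of 80.0 and 90.0 would be assigned 85.0 under this assumption, and a node with responses “pass”, “fail”, “pass” would be assigned the class “pass”.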

There are situations in which the cause of a yield problem is not readily apparent, so the user wants to investigate more than one parameter to determine which parameter is the cause of the yield problem. In this case, the user may invoke the method to generate multiple models simultaneously, so that the yield management system and method in accordance with one embodiment of the present invention build more than one model.

In order to invoke the generate multiple models simultaneously method, the user positions the mouse pointer on “New Split Rule” in the pop-up menu shown in FIG. 17 and clicks the left mouse button to display the window shown in FIG. 20. The user may choose a group of parameters for the model building by highlighting the selected parameters in the scroll-down list shown in FIG. 20. The user also positions the mouse pointer on the “Create new tree for each new split rule selected” box 168 shown in FIG. 20 and clicks the left mouse button. The yield management system and method in accordance with one embodiment of the present invention then generate a model for each of the parameters selected by the user. FIG. 22 shows an example of the results. Consequently, instead of building one model at a time, the yield management system and method in accordance with one embodiment of the present invention produce a plurality of models if the user has invoked the generate multiple models simultaneously method.
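The idea of producing one model per selected parameter can be sketched as a loop over the user's selection. The following Python code is a simplified illustration under stated assumptions, not the described model builder: each "model" is reduced to a single split on a categorical parameter that records the mean response per class, and the names build_stump and build_models, as well as the sample cases, are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def build_stump(cases: Sequence[dict], parameter: str, response: str) -> Dict[str, float]:
    """One-split model: mean response for each class of the chosen parameter."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for case in cases:
        groups[case[parameter]].append(case[response])
    return {cls: mean(vals) for cls, vals in groups.items()}


def build_models(
    cases: Sequence[dict], selected: Sequence[str], response: str
) -> Dict[str, Dict[str, float]]:
    """Build one model for each parameter the user selected."""
    return {p: build_stump(cases, p, response) for p in selected}


cases = [
    {"tool": "A", "chamber": "X", "yield": 90.0},
    {"tool": "A", "chamber": "Y", "yield": 80.0},
    {"tool": "B", "chamber": "X", "yield": 70.0},
    {"tool": "B", "chamber": "Y", "yield": 60.0},
]
models = build_models(cases, ["tool", "chamber"], "yield")
```

Comparing the resulting models side by side shows that, in this made-up data, the tool parameter separates the yields more strongly than the chamber parameter.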

Additional embodiments of the yield management system and method in accordance with the present invention enable the user to select various input/output methods, including a redisplay setup window method and collapse/expand sub-nodes methods, for convenience. These input/output methods will now be described in more detail, beginning with the redisplay setup window method.

Occasionally, setting up all of the options and selecting all of the prediction variables from a data set on which the yield management system and method in accordance with one embodiment of the present invention build the model is time consuming. In order to invoke the redisplay setup window method, the user positions the mouse pointer on the display and clicks the right mouse button to pop up the menu containing “Re-display Setup Dialog” shown in FIG. 23. The user positions the mouse pointer on “Re-Display Setup Dialog” and clicks the left mouse button to display the window shown in FIG. 8. If the user then decides to modify the setup, instead of requiring the user to enter all of the requisite selections again, one embodiment of the yield management system and method in accordance with the present invention enables the user to quickly modify his or her previous selections.

Finally, another embodiment of the yield management system and method in accordance with the present invention preferably enables every node on the decision tree to be collapsed. Referring again to FIG. 17, another menu selection in the pop-up menu is “Collapse Sub-Nodes”. In order to invoke the “Collapse Sub-Nodes” method, the user positions the mouse pointer on “Collapse Sub-Nodes” and clicks the left mouse button. After the user selects the “Collapse Sub-Nodes” method, the menu selection automatically toggles to “Expand Sub-Nodes”, as shown in FIG. 24. Preferably, the user can collapse or expand a node from the decision tree output by simply clicking on the node and selecting the “Collapse/Expand Sub-Nodes” methods. The user may invoke the “Collapse/Expand Sub-Nodes” methods to collapse the node when the user decides that the split of the node is unnecessary or, alternatively, to expand the node when the user wants to examine the aggregate statistics of the entire subset. The “Expand Sub-Nodes” method may also be invoked by the user to expand a previously collapsed node, so that the node returns to its original length.
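The collapse and expand behavior can be modeled as a flag on each node that hides its sub-nodes without discarding them, so that a later expansion restores the original display. The following Python sketch is a minimal illustration under that assumption. The names Node, toggle, menu_label, and render, and the sample tree, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    n_cases: int
    children: List["Node"] = field(default_factory=list)
    collapsed: bool = False

    def toggle(self) -> None:
        """Switch between the collapsed and expanded states."""
        self.collapsed = not self.collapsed

    def menu_label(self) -> str:
        """Menu entry shown for this node; it toggles with the node state."""
        return "Expand Sub-Nodes" if self.collapsed else "Collapse Sub-Nodes"


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the displayed lines; sub-nodes of a collapsed node are hidden."""
    lines = ["  " * depth + f"{node.label} (n={node.n_cases})"]
    if not node.collapsed:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines


root = Node("all lots", 100, [Node("tool = A", 60), Node("tool = B", 40)])
```

Because the children are kept in memory while collapsed, toggling a second time returns the node to its original displayed length.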

Preferably, statistical analysis tools are available to help the user validate the model and identify the yield problem. At each node, a right click of the mouse 22 shown in FIG. 1 produces a list of available analysis tools in a window, as shown in FIG. 25. Every analysis is done at the node level (i.e., it only uses the data from that particular node). An example of the analysis tools available at the right node after the first split is shown in FIG. 25. In this example, those analysis tools may include box-whisker chart, CUSUM control chart, Shewhart control chart, histogram, one-way ANOVA, two-sample comparison, and X-Y correlation analysis, which are well-known to persons skilled in the art. The particular tools available to the user depend upon the nature of the X and Y parameters (e.g., continuous versus categorical).

After each model is built, the decision tree can be saved for future predictions. If a new set of parameter values is available, it can be fed into the model to generate a prediction of the response value for each case.
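Saving a tree and reusing it for prediction can be sketched with a nested dictionary that is serialized and later walked for each new case. The following Python code is an assumption-laden illustration: the description does not specify a storage format, so JSON, the dictionary keys ("parameter", "classes", "in", "out", "value"), the function name predict, and the sample tree are all hypothetical, and only categorical splits are handled.

```python
import json
from typing import Any, Dict

# Hypothetical saved tree: each internal node tests one categorical parameter,
# and each terminal node carries its assigned response value.
tree = {
    "parameter": "etch_tool",
    "classes": ["T1", "T2"],
    "in": {"value": 72.5},
    "out": {
        "parameter": "chamber",
        "classes": ["X"],
        "in": {"value": 91.0},
        "out": {"value": 88.0},
    },
}


def predict(node: Dict[str, Any], case: Dict[str, Any]) -> float:
    """Walk the tree for one case and return the terminal node's value."""
    while "value" not in node:
        branch = "in" if case[node["parameter"]] in node["classes"] else "out"
        node = node[branch]
    return node["value"]


saved = json.dumps(tree)     # save the model for future predictions
loaded = json.loads(saved)   # reload it when new parameter values arrive
new_cases = [
    {"etch_tool": "T1", "chamber": "Y"},
    {"etch_tool": "T3", "chamber": "X"},
]
predictions = [predict(loaded, c) for c in new_cases]
```

Each new case is routed through the same split rules that were fixed at model-building time, and the value stored at the terminal node it reaches is returned as the prediction.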

While the foregoing description has been with reference to particular embodiments of the present invention, it will be appreciated by those skilled in the art that changes in these embodiments may be made without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims.

1. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing means comprising tiered splitting means wherein: the tiered splitting means enables user selection of at least one prediction variable to generate processed data; and means for generating a model based on the processed data.
2. The system of claim 1 wherein the model is a decision tree.
3. The system of claim 1, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
4. The system of claim 1 wherein the tiered splitting means further comprises means for enabling user selection of a predetermined value for the selected prediction variable and means for removing data contained in the input data set for which the selected prediction variable has missing values and values different from the predetermined value, to generate the processed data.
5. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing means comprising auto-categorization means wherein: the auto-categorization means enables user selection for binning at least one response variable contained in the input data set into a class; and means for generating a model based on the processed data.
6. The system of claim 5 wherein the model is a decision tree.
7. The system of claim 5, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
8. The system of claim 5 wherein the auto-categorization means enables user selection for binning the at least one response variable contained in the input data set into a class using data clustering.
9. The system of claim 8 wherein the auto-categorization means further enables the user to enter a number of categories to determine if small clusters are to be excluded.
10. The system of claim 8 wherein the data clustering is performed using a nearest neighbor methodology.
11. The system of claim 8 wherein the auto-categorization means comprises means to provide a preview to enable the user to view results and make adjustments.
12. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing means comprising outlier filtering means to enable user selection of a filter for removing response variables contained in the input data set, the outlier filtering means comprising one or more of the following filters: 1) Mean±N*std, wherein: $\mathrm{Mean} = \sum_{i=1}^{n} x_{i} / n$, $\mathrm{std} = \sqrt{\sum_{i=1}^{n} \left( x_{i} - \mathrm{Mean} \right)^{2}} / \left( n - 1 \right)$, and N is a threshold value selected by the user, whereby the system removes data outside the range of Mean±N*std; and 2) Median±N*MAD, wherein: $\mathrm{MAD} = \sum_{i=1}^{n} \left| x_{i} - \mathrm{Mean} \right| / n$.
13. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing means comprising tool usage parameter means to identify from prediction variables in the input data set a number of times that each tool is used during the semiconductor fabrication process, the tool usage parameter means determining a number that equals the number of times that each tool is used in each case contained in the input data set for the semiconductor fabrication process under analysis and producing an additional variable for each case having a value equal to the number; and means for generating a model based on the processed data.
14. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing means comprising treat integer as categorical means to designate an integer corresponding to a response variable as a categorical variable.
15. The system of claim 14 wherein the treat integer as categorical means enables a user to selectively designate a response variable in a list as a categorical variable.
16. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and means for generating a model based on the processed data, the model being a decision tree, wherein the model generating means comprises: means for generating a linear type split for use in constructing the model comprising means to identify that a response variable and a prediction variable have a linear relationship; and means to construct a decision tree having a predetermined number of sub-nodes using a fitted regression line, the predetermined number of sub-nodes being greater than two.
17. The system of claim 16, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
18. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and means for generating a model based on the processed data, the model being a decision tree, wherein the model generating means comprises: means for generating a range type split using a split rule of the form a1≦X<a2, where X is a variable and a1 and a2 are real numbers.
19. The system of claim 18, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
20. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and means for generating a model based on the processed data, the model being a decision tree, wherein the model generating means comprises means for providing user control in formulating rules for splitting nodes of the decision tree comprising at least one of: means for considering tool and date parameters jointly, whereby a tool parameter and its corresponding date parameter are considered together as a split candidate; means for considering tool and event parameters jointly, whereby a tool parameter and a related event are considered together as a split candidate; means for considering maximum class distinction, whereby the model generating means builds the model based on a split that provides the greatest distinction of a class of categorical response variable; means for parameter weighting to weight one or more variables, whereby the model generating means calculates an internal score for each variable based on its statistical significance and multiplies the score by its weight to obtain an overall score in determining a split parameter; means for preferring simple splits of one or more categorical variables responsive to user selection from a range of preference values; means for specifying minimum purity responsive to user selection of 1) a class of interest, 2) a threshold value for the selected class, and 3) a response variable, wherein purity is defined as all the cases in a node having the same response; means for specifying minimum group size responsive to user selection of a threshold value, whereby the model generating means does not consider a further split when a node contains fewer cases than the selected threshold value; means for specifying a maximum number of descendants in response to selection by the user of a predetermined cut-off level, whereby the model generating means does not generate subsequent splits when the decision tree reaches the predetermined cut-off level; and means for raw data mapping to link a binned variable, which is treated as a categorical variable, to its original form.
21. The system of claim 20, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
22. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and means for generating a model based on the processed data, the model being a decision tree, wherein the model generating means comprises means for providing user control for splitting nodes of the decision tree comprising means for applying a new cut rule, whereby: if the variable is categorical, the user may select any combination of classes of the variable and include them in a first sub-node, the remainder of the data being included in a second sub-node; and if the variable is continuous, applying one of the following split formats responsive to user selection: 1) a default type of the form a≦X; 2) a range type of the form a1≦X<a2; and 3) a linear type of the form (X<a1, X in [a1, a2], X in [a2, a3], X>a3), wherein X is the continuous variable and a, a1, a2, and a3 are real numbers.
23. The system of claim 22, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
24. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and means for generating one or more models based on the processed data, wherein the model generating means is responsive to user selection of a group of variables for the model building to simultaneously generate a model for each of the variables selected by the user.
25. The system of claim 24 wherein the model is a decision tree.
26. The system of claim 24, further comprising means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
27. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; means for generating a model based on the processed data, the model being a decision tree; means for modifying the model based on user input; means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set; and means for redisplaying a setup means to enable a user to modify previous selections.
28. A yield management system, comprising: means for processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; means for generating a model based on the processed data, the model being a decision tree; means for modifying the model based on user input; means for analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set; and means for collapsing a node responsive to user selection when the user decides that the split of the node is unnecessary and for expanding the node responsive to user selection when the user wants to examine aggregate statistics.
29. The system of claim 28 wherein the means for expanding the node expands a previously collapsed node, so that the node returns to its original length.
30. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing comprising tiered splitting wherein: the tiered splitting enables user selection of a prediction variable to generate processed data; and generating a model based on the processed data.
31. The method of claim 30 wherein the model is a decision tree.
32. The method of claim 30, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
33. The method of claim 30 wherein the tiered splitting further comprises enabling user selection of a predetermined value for the selected prediction variable and removing data contained in the input data set for which the selected prediction variable has missing values and values different from the predetermined value, to generate the processed data.
34. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing comprising auto-categorizing wherein: auto-categorizing enables user selection for binning at least one response variable contained in the input data set into a class; and generating a model based on the processed data.
35. The method of claim 34 wherein the model is a decision tree.
36. The method of claim 34, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
37. The method of claim 34 wherein auto-categorizing enables user selection for binning at least one response variable contained in the input data set into a class using data clustering.
38. The method of claim 37 wherein the auto-categorizing further enables the user to enter a number of categories to determine if small clusters are to be excluded.
39. The method of claim 37 wherein the data clustering is performed using a nearest neighbor methodology.
40. The method of claim 37 wherein the auto-categorizing comprises providing a preview to enable the user to view results and make adjustments.
41. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing comprising outlier filtering to enable user selection of a filter for removing response variables contained in the input data set comprising one or more of the following filters: 1) Mean±N*std, wherein: $\mathrm{Mean} = \sum_{i=1}^{n} x_{i} / n$, $\mathrm{std} = \sqrt{\sum_{i=1}^{n} \left( x_{i} - \mathrm{Mean} \right)^{2}} / \left( n - 1 \right)$, and N is a threshold value selected by the user, whereby the method removes data outside the range of Mean±N*std; and 2) Median±N*MAD, wherein: $\mathrm{MAD} = \sum_{i=1}^{n} \left| x_{i} - \mathrm{Mean} \right| / n$.
42. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing comprising identifying from prediction variables in the input data set a number of times that each tool is used during the semiconductor fabrication process to determine a number that equals the number of times that each tool is used in each case contained in the input data set for the semiconductor fabrication process under analysis and producing an additional variable for each case having a value equal to the number; and generating a model based on the processed data.
43. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process, the processing comprising designating an integer corresponding to a response variable as a categorical variable responsive to user selection.
44. The method of claim 43, further comprising enabling a user to selectively designate a variable in a list as a categorical variable.
45. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and generating a model based on the processed data, the model being a decision tree, wherein the model generating comprises: generating a linear type split for use in constructing the model to identify that a response variable and a prediction variable have a linear relationship and to construct the decision tree having a predetermined number of sub-nodes using a fitted regression line, the predetermined number of sub-nodes being greater than two.
46. The method of claim 45, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
47. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and generating a model based on the processed data, the model being a decision tree, wherein the model generating comprises: generating a range type split using a split rule of the form a1≦X<a2, where X is a variable and a1 and a2 are real numbers.
48. The method of claim 47, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
 49. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; generating a model based on the processed data, the model being a decision tree, wherein the model generating comprises providing user control in formulating rules for splitting nodes of the decision tree comprising at least one of: considering tool and date parameters jointly, whereby a tool parameter and its corresponding date parameter are considered together as a split candidate; considering tool and event parameters jointly, whereby a tool parameter and a related event are considered together as a split candidate; considering maximum class distinction, whereby generating a model builds the model based on a split that provides the greatest distinction of a class of categorical response variable; weighting one or more variables, whereby the model generating calculates an internal score for each variable based on its statistical significance and multiplies the score by its weight to obtain an overall score in determining a split parameter; preferring simple splits of one or more categorical variables responsive to user selection from a range of preference values; specifying minimum purity responsive to user selection of 1) a class of interest, 2) a threshold value for the selected class, and 3) a response variable, wherein purity is defined as all the cases in a node having the same response; specifying minimum group size responsive to user selection of a threshold value, whereby the model generating does not consider a further split when a node contains fewer cases than the selected threshold value; specifying a maximum number of descendants in response to selection by the user of a predetermined cut-off level, whereby the model generating does not generate subsequent splits when the decision tree reaches the predetermined cut-off level; and mapping to link a binned variable, which is treated as a categorical variable, to its original raw data form.
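By way of illustration only, the variable weighting, minimum group size, and cut-off level rules recited in claim 49 might be sketched as follows. The names `Candidate`, `overall_score`, `choose_split`, and `may_split` are hypothetical, the internal score is assumed to be computed elsewhere, and the cut-off level is assumed to be a tree depth; none of these details are taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate split variable (hypothetical structure)."""
    name: str
    internal_score: float  # assumed precomputed from statistical significance
    weight: float = 1.0    # user-assigned weight


def overall_score(c: Candidate) -> float:
    # Overall score = internal score multiplied by the user weight.
    return c.internal_score * c.weight


def choose_split(candidates):
    # Select the candidate with the highest overall score.
    return max(candidates, key=overall_score)


def may_split(n_cases: int, level: int, min_group_size: int, cutoff_level: int) -> bool:
    # No further split when the node holds fewer cases than the threshold,
    # or when the tree has reached the cut-off level (assumed to be a depth).
    return n_cases >= min_group_size and level < cutoff_level
```

In this sketch a variable with a lower internal score can still be selected when the user gives it a larger weight, which is the effect the weighting element describes.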
 50. The method of claim 49, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
 51. A yield managementmethod, comprising: processing an input data set comprising one or moreprediction variables and one or more response variables containing dataabout a particular semiconductor process; and generating a model basedon the processed data, the model being a decision tree, wherein themodel generating comprises providing user control for splitting nodes ofthe decision tree comprising applying a new cut rule, whereby: if thevariable is categorical, the user may select any combination of classesof the variable and include them in a first sub-node, the remainder ofthe data being included in a second sub-node; and if the variable iscontinuous, applying one of the following split formats responsive touser selection: 1) a default type of the form a≦X; 2) a range type ofthe form a1≦X<a2; and 3) a linear type of the form X<a1, X in [a1, a2],X in [a2, a3], X>a3), wherein X is the continuous variable and a, a1,a2, and a3 are real numbers.
 52. The method of claim 51, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
 53. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; and generating one or more models based on the processed data, wherein the model generating is responsive to user selection of a group of variables for the model building to simultaneously generate a model for each of the variables selected by the user.
 54. The method of claim 53 wherein the model is a decision tree.
 55. The method of claim 53, further comprising analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set.
 56. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; generating a model based on the processed data, the model being a decision tree; modifying the model based on user input; analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set; and redisplaying a setup to enable a user to modify previous selections.
 57. A yield management method, comprising: processing an input data set comprising one or more prediction variables and one or more response variables containing data about a particular semiconductor process; generating a model based on the processed data, the model being a decision tree; modifying the model based on user input; analyzing the model using a statistical tool to generate one or more key yield factors based on the input data set; and collapsing a node responsive to user selection when the user decides that the split of the node is unnecessary and expanding the node responsive to user selection when the user wants to examine aggregate statistics.
 58. The method of claim 57 wherein expanding the node expands a previously collapsed node, so that the node returns to its original length.
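By way of illustration only, the collapsing and expanding of nodes recited in claims 57 and 58 might be sketched as a flag on each tree node that hides its descendants without deleting them, so that a later expansion restores the node to its original extent. The `Node` class and the `visible_size` helper are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A decision tree node with a display-only collapsed flag (hypothetical)."""
    label: str
    children: list = field(default_factory=list)
    collapsed: bool = False

    def collapse(self):
        # Hide the descendants; they are retained so expand() can restore them.
        self.collapsed = True

    def expand(self):
        # Show the descendants again, returning the node to its original extent.
        self.collapsed = False

    def visible_children(self):
        return [] if self.collapsed else self.children


def visible_size(node):
    # Count the nodes currently displayed in the subtree rooted at `node`.
    return 1 + sum(visible_size(c) for c in node.visible_children())
```

For example, collapsing a node with two leaves in a five-node tree leaves three nodes displayed, and expanding it displays all five again.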